1 Introduction

This package is based on the code of Dr. Villette and Dr. Larsen. The package is written and maintained by Dr. Villette. This package is meant to facilitate microbiome exploration and ensuring nice plotting.
This package covers :

  • The dada2 pipeline with wrapper functions that ease the processing of multiple projects

  • Some plotting functions for beta diversity, heatmap and differential abundance analysis using directly a phyloseq object

  • A pipeline for IgASeq analysis

2 Trim, denoise and align your sequences using dada2 pipeline

For convenience this will not be a reproducible example, dada2 takes too long to compute and knit. This part of the tutorial will present a run that we performed in house. The rest of the tutorial will be based on reproducible data.

2.1 Get the files

We use here a wrapper function that will create a list of three for pair end :

  • forward files

  • reverse files

  • names of the files

And for single end : -

  • a list of files

  • names of the files


f_list = list_fastq("/home/bigbeast/Documents/tmp/2022-12 CIMMAP run ELISE", 
                    pattern = c("R1", "R2"), separator = "_", level = 1)

# check that all files are distinct

lapply(f_list, duplicated)

# check that all files exist5
lapply(f_list, file.exists)
random=  lapply(f_list, "[",sample(1:255,30))
# tmp= summarise_fastq(random, cores =30, plot = F )

2.2 Check the quality profile

We will use the function qc_check. This will take time as the plotQualityProfile isn’t parallelized in dada2. This function will create two plots (for pair end) of n aggregated samples and only one plot if you are using single end.

qc_check(flist, n=30)

2.3 Check the best trimming parameters for your set of fastq

2.3.1 Test optimal cutting param

set.seed(1)
tmp = lapply(f_list, "[",sample(1:255,30))
tmp_list= filt_list(tmp)
filtered= list()

cb= combn(x= (300-seq(0, 50, by=10)), m=2)
# Very long be careful
#try to test combination of trimmings
for(i in 1:dim(cb)[2]){
  filtered[[i]]= filter_fastq(tmp, tmp_list, cutting_param = cb[ ,i], cores = 35, trimleft = 35, maxEE = c(3,4))
}
#extract the data to a df 
per= NULL
for(i in 1:15) {
t =  filtered[[i]] %>% 
  as.data.frame() %>%
  mutate(per=reads.out/reads.in)
per= cbind(per, t$per) %>% 
  as.data.frame()
# per$names=rownames(tmp[1])
print(per)
}

colnames(per)= paste(cb[1,], cb[2,]) 
per$names=rownames(t[1])

# plot 

png("transmic and IgA rescue mice percent reads passed depending on cutting parameters.png", width = 600, height=400)
per %>% 
  as.data.frame() %>%
  pivot_longer(names_to = "cut", values_to = "per", cols = 1:15) %>% 
  ggplot(aes(y=per, x= cut))+
  geom_boxplot()+ geom_line(aes(group=names, col=names))+
  labs(y= "percentage passed", x= "Trimming parameters")+
  theme(axis.text.x = element_text(angle=90), legend.position = "none")
 

dev.off()

2.3.2 Test optimal maxEE

cb= combn(x= (7-seq(0,5, by=1)), m=2)
cb =  cb[nrow(cb):1,]
# Very long be careful
#try to test combination of trimmings
# registerDoParallel(cl = makeCluster(5))
for(i in 1:dim(cb)[2]){
  filtered[[i]]= filter_fastq(tmp, tmp_list, cutting_param = c(260,250),
                              cores = 35, trimleft = 35, maxEE = cb[ ,i])
}
#extract the data to a df 
per= NULL
for(i in 1:15) {
t =  filtered[[i]] %>% 
  as.data.frame() %>%
  mutate(per=reads.out/reads.in)
per= cbind(per, t$per) %>% 
  as.data.frame()
}

colnames(per)= paste(cb[1,], cb[2,]) 
per$names=rownames(t[1])

# plot 

png("transmic and IgA rescue mice percent reads passed depending on errors parameters.png", width = 600, height=400)
per %>% as.data.frame() %>% pivot_longer(names_to = "cut", values_to = "per", cols = 1:15) %>% 
  ggplot(aes(y=per, x= cut))+
  geom_boxplot()+ geom_line(aes(group=names, col=names))+
  labs(y= "percentage passed", x= "Trimming parameters")+
  theme(axis.text.x = element_text(angle=90), legend.position = "none")
dev.off()

2.4 Filter and trim the fastq

We now have to remove the bad quality reads and trim the length. You will find a function to create the list of filtered files and one to make the filtered files.

filt= filt_list(f_list) # create the list of filtered files

filtered = filterAndTrim(fwd= fwd, filt = filtFs, rev=rv, filt.rev = filtRs, truncLen = c(260,240), trimLeft = 25, maxEE = c(3,5), multithread = 45)

3 Analyse your 16S / metabolomic data

3.1 Phyloseq object oriented functions

We will use the enterotype data to explore some of the plotting functions. Let’s start with the beta diversity functions beta_diversity and beta_dispersion.

data(enterotype)

3.1.1 Alpha diversity plots

Alpha diversity is an important facet of microbiome analysis. I’ve created wrapper function to plot either alpha diversity as boxplots or as line plots.

alpha_diversity(enterotype, measure="Shannon", x="Enterotype", group="SeqTech", plot_type="boxplot")

alpha_diversity(enterotype, measure="Shannon", x="Enterotype", group="SeqTech", plot_type="line")

These plots are compatible with other aspects of ggplot, such as facets or stats. Stats are also implemented in this function but are very sparse.

alpha_diversity(enterotype, measure="Shannon", x="Enterotype", group="SeqTech", plot_type="boxplot", stat = T)

library(ggpubr)
enterotype %>% 
  subset_samples(Enterotype!="NA")%>%
alpha_diversity( measure="Shannon", x="Enterotype", group="Enterotype", plot_type="boxplot")+
  stat_compare_means(comparisons = list(c("1", "2")))+
  facet_grid(~SeqTech)

3.1.2 Beta diversity plots

You will have the choice between beta_diversity and beta_dispersion for your beta diversity plotting.

beta_dispersion will plot PCoA, NMDS, PCA, DCA, CA and t-SNE for the moment. This function will plot the two components of your choosing, confidence ellipses and boxplot for each axis and for each group. Each function will return a plot and a percentage of contribution for each component.

beta_diversity will be removed progressively from this package. This function is redundant with beta_dispersion.

beta_diversity(enterotype, dist="bray", method="PCoA", group="SeqTech", permanova = F)


beta_dispersion(enterotype, dist = "bray", method = "PCoA", group = "SeqTech")

#> $reduction
#> Call: vegan::wcmdscale(d = d, eig = T)
#> 
#>           Inertia Rank
#> Total       57.45     
#> Real        69.09  114
#> Imaginary  -11.63  165
#> 
#> Results have 280 points, 114 axes
#> 
#> Eigenvalues:
#>   [1] 31.376  8.067  5.376  2.497  2.208  1.753  1.385  1.342  0.967  0.865
#>  [11]  0.779  0.695  0.648  0.543  0.517  0.473  0.450  0.428  0.410  0.382
#>  [21]  0.368  0.357  0.334  0.310  0.303  0.291  0.283  0.272  0.242  0.231
#>  [31]  0.216  0.208  0.204  0.189  0.180  0.175  0.163  0.156  0.150  0.146
#>  [41]  0.138  0.132  0.128  0.125  0.121  0.119  0.112  0.107  0.101  0.097
#>  [51]  0.093  0.089  0.083  0.080  0.079  0.073  0.072  0.068  0.065  0.064
#>  [61]  0.060  0.059  0.055  0.054  0.051  0.049  0.048  0.044  0.043  0.041
#>  [71]  0.041  0.039  0.037  0.035  0.034  0.033  0.031  0.030  0.030  0.028
#>  [81]  0.025  0.024  0.024  0.023  0.021  0.020  0.020  0.018  0.016  0.015
#>  [91]  0.014  0.013  0.013  0.012  0.011  0.011  0.010  0.009  0.009  0.008
#> [101]  0.007  0.006  0.005  0.005  0.004  0.004  0.003  0.003  0.002  0.001
#> [111]  0.001  0.001  0.000  0.000  0.000 -0.001 -0.001 -0.001 -0.002 -0.002
#> (Showing 120 of 279 eigenvalues)
#> 
#> Weights: Constant
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# beta_dispersion()

3.1.2.1 Multiple dimension reductions

There are unconstrained and constrained approaches to beta diversity. Unconstrained analysis, more common in literature, are either based on:

  • dissimilarity matrix (PCoA, NMDS, DCA)
  • directly on the raw matrix (PCA, CA)
  • t-SNE that can use both dissimilarity matrix or the raw matrix
3.1.2.1.1 Unconstrained models

You can play with the parameters of each function like the following plots :

3.1.2.1.1.1 PCoA
beta_dispersion(enterotype, dist = "bray", method = "PCoA", group = "SeqTech",
                color_vector = c("#777711", "#117777", "#DD7788"),
                legend_title = "Sequencing tech", lwd = 2, 
                font = 2, draw = "polygon", text = T, permanova = T, 
                y.intersp = 0.7)

#> $reduction
#> Call: vegan::wcmdscale(d = d, eig = T)
#> 
#>           Inertia Rank
#> Total       57.45     
#> Real        69.09  114
#> Imaginary  -11.63  165
#> 
#> Results have 280 points, 114 axes
#> 
#> Eigenvalues:
#>   [1] 31.376  8.067  5.376  2.497  2.208  1.753  1.385  1.342  0.967  0.865
#>  [11]  0.779  0.695  0.648  0.543  0.517  0.473  0.450  0.428  0.410  0.382
#>  [21]  0.368  0.357  0.334  0.310  0.303  0.291  0.283  0.272  0.242  0.231
#>  [31]  0.216  0.208  0.204  0.189  0.180  0.175  0.163  0.156  0.150  0.146
#>  [41]  0.138  0.132  0.128  0.125  0.121  0.119  0.112  0.107  0.101  0.097
#>  [51]  0.093  0.089  0.083  0.080  0.079  0.073  0.072  0.068  0.065  0.064
#>  [61]  0.060  0.059  0.055  0.054  0.051  0.049  0.048  0.044  0.043  0.041
#>  [71]  0.041  0.039  0.037  0.035  0.034  0.033  0.031  0.030  0.030  0.028
#>  [81]  0.025  0.024  0.024  0.023  0.021  0.020  0.020  0.018  0.016  0.015
#>  [91]  0.014  0.013  0.013  0.012  0.011  0.011  0.010  0.009  0.009  0.008
#> [101]  0.007  0.006  0.005  0.005  0.004  0.004  0.003  0.003  0.002  0.001
#> [111]  0.001  0.001  0.000  0.000  0.000 -0.001 -0.001 -0.001 -0.002 -0.002
#> (Showing 120 of 279 eigenvalues)
#> 
#> Weights: Constant
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.1.2.1.1.2 t-SNE
beta_dispersion(enterotype, dist = "bray", method = "tsne", group = "SeqTech",
                color_vector = c("#777711", "#117777", "#DD7788"),
                legend_title = "Sequencing tech without boxplots", lwd = 2, 
                font = 2, draw = "polygon", text = T, permanova = T, 
                y.intersp = 0.7,
                boxplot = F, where = "bottom")

#> $reduction
#> $N
#> [1] 280
#> 
#> $Y
#>               [,1]       [,2]
#>   [1,] -0.08402678 -14.489454
#>   [2,] -0.54750203 -13.809009
#>   [3,] -1.82478383 -14.755933
#>   [4,] -0.06821868 -14.467274
#>   [5,]  1.97026209 -19.348574
#>   [6,]  0.41635562 -19.687455
#>   [7,] -7.05393332 -18.784530
#>   [8,] -5.59049099 -23.130444
#>   [9,] -5.96256394 -22.921497
#>  [10,] -4.89538333 -22.854616
#>  [11,] -6.48087334  -7.842522
#>  [12,]  3.47081834 -18.874737
#>  [13,]  3.42235248 -17.884296
#>  [14,] -4.85651732 -15.804279
#>  [15,] -3.30183631 -16.409896
#>  [16,] -7.34523394 -18.568468
#>  [17,] -6.42980260 -10.317381
#>  [18,] -3.07056440 -14.013133
#>  [19,] -0.48074417 -16.770656
#>  [20,] -0.23323519 -17.710207
#>  [21,] -2.76930815 -14.301993
#>  [22,] -4.13480760 -12.635360
#>  [23,] -4.25933909 -11.718129
#>  [24,] -2.29766837 -19.091395
#>  [25,] -4.17630339 -11.861561
#>  [26,]  4.12419628 -19.109204
#>  [27,] -4.56411513 -19.341799
#>  [28,] -0.37764288 -17.098845
#>  [29,] -6.24206625  -8.839312
#>  [30,] -1.64852069 -15.967286
#>  [31,] -1.61104316 -16.035173
#>  [32,] -4.73414595 -10.161157
#>  [33,] -3.75770089 -18.375639
#>  [34,] -6.25642453  -9.121743
#>  [35,] -5.25682246  -7.877132
#>  [36,] -5.93912089  -9.467616
#>  [37,] -4.33381438 -10.478112
#>  [38,] -2.35659164 -19.243410
#>  [39,]  2.32555631  -2.603642
#>  [40,]  2.32529624  -2.604949
#>  [41,] -8.79902664 -15.972933
#>  [42,]  4.09680036 -19.437427
#>  [43,] -3.97166715 -14.769294
#>  [44,] -5.44896749 -12.983928
#>  [45,] -5.69132362 -16.310864
#>  [46,]  2.63022031 -19.339559
#>  [47,]  1.24257391 -19.345449
#>  [48,] -3.05425292 -15.877069
#>  [49,] -3.40616616 -11.251594
#>  [50,] -4.51224603 -22.472634
#>  [51,] -5.74650482  -8.759965
#>  [52,] -5.74142125 -21.024695
#>  [53,] -5.68222734 -22.448863
#>  [54,] -6.40278029 -18.773218
#>  [55,] -8.10944434 -16.050188
#>  [56,] -4.61137108 -13.832931
#>  [57,] -6.55183336 -11.458356
#>  [58,] -4.60966241 -16.443896
#>  [59,]  3.88180910 -18.659281
#>  [60,] -4.74534228 -11.916155
#>  [61,] -6.02941693 -10.186810
#>  [62,] -4.58029687 -17.315900
#>  [63,] -5.09213360 -21.819865
#>  [64,]  0.75955290 -18.831250
#>  [65,] -6.42385298 -14.265248
#>  [66,] -6.32845661 -16.396996
#>  [67,]  1.20158764 -18.975777
#>  [68,] -5.05033889 -11.828210
#>  [69,] -6.39584072 -16.743191
#>  [70,] -4.72867266 -22.451533
#>  [71,] -4.25752373 -21.949554
#>  [72,]  3.25183230 -19.323963
#>  [73,] -4.93930012 -14.849980
#>  [74,]  3.75882295 -19.448826
#>  [75,] -5.16054325 -17.660998
#>  [76,] -5.47929717 -17.446213
#>  [77,] -6.70663684 -15.919409
#>  [78,] -6.08029840 -16.122960
#>  [79,] -6.25641463 -12.659702
#>  [80,] -5.98121271 -18.150946
#>  [81,] -5.61614968 -11.170241
#>  [82,] -5.60178851 -18.879910
#>  [83,] -6.20371782 -15.442153
#>  [84,] -2.49122518 -12.632536
#>  [85,] -6.28501956 -12.475014
#>  [86,] -0.21237170 -18.956493
#>  [87,] -5.91623476 -22.768658
#>  [88,] -6.02359865 -18.254761
#>  [89,]  1.41890147 -18.545347
#>  [90,] -1.51972426 -17.559261
#>  [91,] -5.32123071 -14.971276
#>  [92,] -5.37162661 -21.073610
#>  [93,] -4.09705068 -21.988633
#>  [94,] -5.74480072 -18.938054
#>  [95,] -4.88570905 -17.793504
#>  [96,] -4.99414535 -18.513430
#>  [97,] -6.83182020 -17.101413
#>  [98,] -5.89026559 -14.111029
#>  [99,] -3.91454588 -15.576334
#> [100,] -5.52067449 -21.860965
#> [101,] -6.65644640 -10.408689
#> [102,] -2.53364750 -15.300491
#> [103,] -0.03699264 -19.860098
#> [104,]  2.08012299 -18.945190
#> [105,] -4.45404342 -21.764321
#> [106,] -5.61674115 -19.133273
#> [107,] -6.23163493 -12.879730
#> [108,] -5.88494121  -9.697062
#> [109,] -3.59408922 -15.272087
#> [110,] -5.26517398 -20.921737
#> [111,] -5.58283320 -18.366651
#> [112,] -5.43938876  -8.854396
#> [113,] -5.67648378 -11.178114
#> [114,] -6.62791823 -14.709238
#> [115,] -1.78494732 -12.973720
#> [116,] -5.90930166 -17.129973
#> [117,] -1.35039369 -17.583415
#> [118,]  2.17512316 -19.712052
#> [119,] -6.06594630 -22.120456
#> [120,] -2.62806847 -12.766994
#> [121,] -6.19383247 -17.504033
#> [122,] -1.77931629 -12.908749
#> [123,] -4.20452890 -16.364333
#> [124,]  3.81122198 -19.156231
#> [125,]  3.45901865 -17.704209
#> [126,] -2.55395067 -14.707027
#> [127,]  4.32661428   9.302460
#> [128,]  3.55125411  12.378067
#> [129,]  0.89395106  10.371639
#> [130,]  0.89838297  21.143943
#> [131,]  2.76065573  11.907494
#> [132,]  1.67930440   8.832745
#> [133,] -0.38270469  21.348971
#> [134,]  0.85863540  16.740766
#> [135,]  8.49863627  13.618231
#> [136,] -1.04938880   8.375827
#> [137,] -1.07547859  11.398509
#> [138,]  1.70429350  19.967061
#> [139,]  2.01051415  15.321676
#> [140,] 10.66998053  13.910212
#> [141,]  7.75956216  15.645164
#> [142,]  4.70053431  13.179350
#> [143,] -0.65277375   7.464101
#> [144,]  0.27059489  13.818684
#> [145,] -0.36380370  12.021292
#> [146,] 10.64072567  14.394440
#> [147,] -0.69108196  14.548271
#> [148,] -1.11659119  14.756877
#> [149,]  5.58335271  15.884451
#> [150,]  4.26975023  14.558698
#> [151,]  1.86792239   8.934656
#> [152,]  0.27307075  10.586114
#> [153,]  4.08104284  11.142928
#> [154,] -2.32233591   7.348246
#> [155,] -0.69190002  16.473941
#> [156,]  0.60418060  11.211301
#> [157,]  4.22374359  10.717197
#> [158,]  2.37490967  16.795448
#> [159,]  1.96482540   9.802854
#> [160,]  2.11553996  19.087433
#> [161,]  7.80342701  11.843250
#> [162,]  0.94118026  19.716956
#> [163,]  1.95487302  19.692072
#> [164,]  1.32505351  20.314299
#> [165,] 10.18185933  12.997851
#> [166,]  5.21661097  10.664394
#> [167,]  3.98704252  18.071398
#> [168,]  3.14445111  13.698052
#> [169,] 10.91441050  13.564926
#> [170,]  0.88159907   9.612618
#> [171,]  5.84281816   7.880112
#> [172,]  1.68124486  13.953424
#> [173,]  3.34195603   9.816130
#> [174,]  2.24478832  10.620239
#> [175,] 10.43354508  13.437894
#> [176,]  0.47901253  20.909273
#> [177,]  2.18624895  15.971381
#> [178,]  1.90940467  13.223076
#> [179,]  2.67198192  14.407414
#> [180,]  6.66371618   9.665008
#> [181,]  6.32595704   9.406645
#> [182,]  1.30095632   9.166236
#> [183,]  1.96923776   8.797758
#> [184,] 11.72944842  13.378054
#> [185,]  4.89410761  12.169414
#> [186,] -2.46775175   6.507304
#> [187,]  2.40703540   9.763486
#> [188,] -0.63472642   6.898632
#> [189,]  6.29949186   7.573878
#> [190,] -0.89914612   8.972461
#> [191,]  5.58473180  14.112808
#> [192,]  2.35743619  17.176222
#> [193,]  7.91048565  14.334874
#> [194,] -0.77330448  10.939591
#> [195,]  1.18694517  21.625419
#> [196,]  7.06793204  17.185447
#> [197,]  1.87949483  20.631623
#> [198,] -1.54477466   6.743740
#> [199,] -0.71313701  10.398613
#> [200,]  2.65704000   8.511717
#> [201,] -1.95943653   6.538088
#> [202,]  1.04924563   6.605924
#> [203,]  3.14702347  15.344417
#> [204,]  2.19997654   7.944787
#> [205,] -0.78034452  13.865418
#> [206,]  3.59341154  13.837347
#> [207,] -1.57383301   8.485237
#> [208,]  0.80042346  15.835192
#> [209,]  1.15669460  17.341306
#> [210,] -0.49428222  15.261781
#> [211,]  0.23122105  12.068281
#> [212,]  6.94567859   9.476660
#> [213,]  9.85515470  13.503091
#> [214,]  0.30208147  17.582696
#> [215,]  0.63962140  20.364197
#> [216,]  0.51947442  19.806204
#> [217,]  0.18889809  13.043415
#> [218,] -2.10210489   7.761172
#> [219,]  7.27033843   9.461845
#> [220,]  4.95401765  12.393194
#> [221,]  2.50690338  20.426533
#> [222,]  2.49125987  16.206675
#> [223,] -0.28047052  20.590517
#> [224,]  2.37018949  14.300859
#> [225,]  9.62839027  13.667170
#> [226,]  2.54687115  18.882703
#> [227,]  3.98973577  18.304306
#> [228,] 10.63814466  13.760826
#> [229,] -0.94208912   8.445567
#> [230,]  0.34124234  20.925540
#> [231,] -0.15524476   7.714580
#> [232,] -1.90095661   7.663169
#> [233,]  1.87419706  12.291085
#> [234,]  6.28800151  18.131116
#> [235,]  2.94324543  18.844392
#> [236,]  0.45689256  19.878165
#> [237,]  0.10156675  21.323081
#> [238,]  7.35623934  16.939951
#> [239,] -0.01235342  18.663653
#> [240,]  0.02047117  11.355523
#> [241,]  2.39359094   5.443848
#> [242,]  8.96982003  13.362543
#> [243,]  0.07357717  17.257444
#> [244,] -1.92305115   7.295954
#> [245,]  0.99187797  15.380983
#> [246,]  5.72181134   7.899151
#> [247,]  6.43836206  12.247776
#> [248,]  6.21097842   7.901787
#> [249,]  1.71325100  14.824886
#> [250,]  3.44211016  10.410924
#> [251,]  4.99720863  15.778052
#> [252,]  9.19307982  13.659862
#> [253,]  6.12098466  11.312892
#> [254,] -0.14925350  16.026346
#> [255,] -0.24790121  16.203304
#> [256,] -1.03436781   9.563012
#> [257,]  6.69841934  14.245501
#> [258,] -1.59981716  16.089695
#> [259,]  1.74210645   8.199994
#> [260,]  4.76469331  18.144114
#> [261,] 11.37923260  13.467096
#> [262,]  4.87127506  10.649697
#> [263,] 11.44626559  13.216537
#> [264,]  4.50848503  13.089912
#> [265,] 11.95392327  13.385864
#> [266,]  0.09724124  16.734376
#> [267,]  4.69809783  11.392126
#> [268,]  7.87345311  15.090532
#> [269,] -1.95102846   5.973187
#> [270,]  1.43597458  14.252112
#> [271,] -0.43944230  11.286921
#> [272,] -0.18465652   8.987304
#> [273,]  2.09843114  19.312023
#> [274,]  3.81153105  15.607996
#> [275,]  0.74531172   8.121830
#> [276,]  0.35260939   7.947705
#> [277,]  4.15086206  10.030675
#> [278,]  1.82589937  10.477651
#> [279,] -1.40151200  11.246381
#> [280,] -2.36801966   6.691952
#> 
#> $costs
#>   [1]  1.268470e-03  5.616187e-04  1.280911e-03  1.543918e-03  5.297053e-04
#>   [6]  1.264030e-03  2.212517e-05 -5.070735e-05 -2.787540e-05  1.472186e-04
#>  [11]  2.618372e-03  3.562176e-04  1.427522e-03 -3.948765e-05  1.989866e-03
#>  [16]  6.078647e-04  1.671381e-04  1.414691e-03  1.518803e-03  1.191712e-03
#>  [21]  1.316990e-03  9.211599e-04  1.806888e-04  2.143729e-03  7.777222e-04
#>  [26]  6.356404e-04  5.844939e-04  8.549787e-04  5.088371e-05  9.403421e-04
#>  [31] -4.307464e-05  4.518142e-04  9.907933e-04  4.139949e-04  3.449080e-03
#>  [36]  3.331611e-04  3.442969e-04  5.550483e-04  4.019635e-03  4.128927e-03
#>  [41]  1.917445e-03  7.488208e-04 -6.161200e-04  8.943485e-04  4.770052e-04
#>  [46]  5.072563e-04  8.450649e-04  1.349513e-03  9.747629e-04  2.786994e-04
#>  [51]  7.435445e-04  1.036477e-03 -5.919668e-05  6.850019e-04  1.319361e-03
#>  [56]  2.235547e-03  8.137100e-04  1.462533e-03  5.274244e-04  3.779713e-04
#>  [61]  1.026156e-04  9.523088e-04 -3.892156e-05  1.311990e-03  1.533330e-03
#>  [66]  1.456337e-03  8.240417e-04  1.330233e-03  5.657046e-04  6.297338e-05
#>  [71]  5.158202e-05  2.907818e-04  9.636466e-04  2.068097e-04  3.385838e-04
#>  [76]  1.727137e-04  5.573801e-04 -5.150117e-04  1.208687e-03  1.765724e-04
#>  [81]  9.037489e-04  4.600203e-04  1.430012e-03  3.026954e-04  7.055880e-04
#>  [86]  2.151177e-03 -5.657831e-05  3.255948e-04  4.122651e-04  2.596556e-03
#>  [91] -1.710423e-04  9.304122e-04  8.666966e-04  6.828857e-04  1.313820e-03
#>  [96]  7.728343e-04  2.211745e-04  1.645620e-03  1.513019e-03 -7.067259e-05
#> [101]  1.136322e-04 -2.454287e-04  2.373120e-03  5.809140e-04  7.127932e-04
#> [106]  1.719441e-04  8.710232e-04  3.531259e-04  1.510294e-03 -1.488270e-05
#> [111]  3.572561e-04  6.338157e-04  6.450006e-04  2.188081e-04  1.651620e-03
#> [116]  3.118557e-04  1.985967e-03  3.544158e-04  1.654024e-04  7.515073e-04
#> [121]  5.327760e-04  1.057330e-03  3.820033e-04  1.162463e-04  1.727576e-03
#> [126]  1.090915e-03  1.366486e-03  1.274675e-03  3.041580e-04  2.947353e-05
#> [131]  8.029760e-04  6.765267e-04  2.060047e-05  1.223667e-03  8.391803e-04
#> [136]  8.452170e-04  1.138363e-03  2.533533e-05  3.338094e-03  3.599084e-04
#> [141]  2.598799e-03  8.832708e-04  6.707527e-04  6.957703e-04  1.453426e-03
#> [146]  8.259180e-04  1.334646e-03  9.345302e-04  2.151229e-04  3.760175e-03
#> [151]  7.510244e-04  1.945072e-03  1.723623e-03  5.553067e-06  6.043446e-04
#> [156]  1.952236e-04  1.344759e-03  1.779758e-03  1.147047e-03  9.838382e-04
#> [161]  2.642942e-03  8.096634e-04  7.122868e-04  1.520361e-04  1.571532e-04
#> [166]  6.441100e-04  2.052647e-03  3.333838e-03  2.501761e-04  5.349627e-04
#> [171]  1.286585e-03  2.369820e-03  1.207498e-03  1.198606e-03  2.810520e-04
#> [176] -3.390501e-04  7.723888e-04  2.651455e-03  1.247522e-03  3.887590e-03
#> [181]  1.202196e-03  1.713496e-03  1.475077e-03  6.914219e-04  3.580793e-04
#> [186]  5.764043e-05  5.974948e-04  4.653095e-04  8.850869e-04  1.459536e-03
#> [191]  7.605895e-04  1.559027e-03  2.986817e-03  1.494508e-03  7.002981e-05
#> [196]  2.170326e-03  6.020494e-04  2.284491e-04  1.564666e-03 -3.839084e-05
#> [201]  1.158894e-04  3.562691e-03  1.675291e-03  7.268457e-04  2.230405e-03
#> [206]  2.606000e-03  6.548716e-04 -6.398448e-04  5.071388e-04  1.114841e-03
#> [211]  2.186676e-03  4.001644e-03  7.219480e-04  1.282926e-03  5.356154e-05
#> [216]  9.442128e-05  1.628873e-03  1.212121e-04  9.514490e-04  6.455522e-04
#> [221]  1.218176e-06  1.451463e-03  3.001422e-04  2.049226e-03  6.679158e-04
#> [226]  1.526321e-03  1.933960e-03  2.514584e-04  6.756935e-04 -1.164116e-04
#> [231]  1.124682e-03  6.317509e-05  3.064103e-03  1.989443e-03  4.122464e-04
#> [236] -2.024155e-04 -3.825223e-05  1.007911e-03  1.328614e-03  1.023608e-03
#> [241]  3.846256e-03  1.858672e-03  6.525372e-04  6.171062e-05  7.846191e-04
#> [246]  1.559735e-03  1.412168e-03  1.830645e-03  1.822230e-03  1.300523e-03
#> [251]  2.072767e-03  1.447324e-03  1.041612e-03  1.325850e-03  7.691127e-04
#> [256]  1.275887e-03  4.685111e-03  1.619327e-03  1.078641e-03  9.539217e-04
#> [261]  4.823906e-04  1.078623e-03  2.420004e-04  2.334402e-03  9.540422e-04
#> [266]  1.146610e-03  1.891390e-03  1.297935e-03  1.758168e-04  1.673831e-03
#> [271]  1.482305e-03  9.084798e-04  9.889593e-04  9.792318e-04  8.845710e-04
#> [276]  3.752094e-04  1.533062e-03  1.337785e-03  6.731742e-04  3.201092e-05
#> 
#> $itercosts
#>  [1] 49.0381974 48.1091295 47.9682004 47.8326895 47.8612154  0.3607616
#>  [7]  0.3122722  0.2964914  0.2918155  0.2897631  0.2859687  0.2844417
#> [13]  0.2850085  0.2839697  0.2838602  0.2833419  0.2821053  0.2816678
#> [19]  0.2819076  0.2811398
#> 
#> $origD
#> [1] 50
#> 
#> $perplexity
#> [1] 30
#> 
#> $theta
#> [1] 0.5
#> 
#> $max_iter
#> [1] 1000
#> 
#> $stop_lying_iter
#> [1] 250
#> 
#> $mom_switch_iter
#> [1] 250
#> 
#> $momentum
#> [1] 0.5
#> 
#> $final_momentum
#> [1] 0.8
#> 
#> $eta
#> [1] 200
#> 
#> $exaggeration_factor
#> [1] 12
#> 
#> attr(,"class")
#> [1] "Rtsne" "list" 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.1.2.1.1.3 NMDS
beta_dispersion(enterotype, dist = "bray", method = "NMDS", group = "SeqTech",
                color_vector = c("#777711", "#117777", "#DD7788"),
                legend_title = "Sequencing tech without boxplots", lwd = 2, 
                font = 2, draw = "polygon", text = T, permanova = T, 
                y.intersp = 0.7,
                boxplot = F, where = "bottom")
#> Run 0 stress 0.1380077 
#> Run 1 stress 0.1562307 
#> Run 2 stress 0.1515837 
#> Run 3 stress 0.160802 
#> Run 4 stress 0.155239 
#> Run 5 stress 0.1498397 
#> Run 6 stress 0.1523578 
#> Run 7 stress 0.1557621 
#> Run 8 stress 0.1567377 
#> Run 9 stress 0.1579532 
#> Run 10 stress 0.1590286 
#> Run 11 stress 0.1564627 
#> Run 12 stress 0.1500833 
#> Run 13 stress 0.1588504 
#> Run 14 stress 0.1521323 
#> Run 15 stress 0.1605488 
#> Run 16 stress 0.1589632 
#> Run 17 stress 0.1525489 
#> Run 18 stress 0.1563119 
#> Run 19 stress 0.1439898 
#> Run 20 stress 0.1514314 
#> *** Best solution was not repeated -- monoMDS stopping criteria:
#>     14: stress ratio > sratmax
#>      6: scale factor of the gradient < sfgrmin

#> $reduction
#> 
#> Call:
#> metaMDS(comm = otu) 
#> 
#> global Multidimensional Scaling using monoMDS
#> 
#> Data:     otu 
#> Distance: bray 
#> 
#> Dimensions: 2 
#> Stress:     0.1380077 
#> Stress type 1, weak ties
#> Best solution was not repeated after 20 tries
#> The best solution was from try 0 (metric scaling or null solution)
#> Scaling: centring, PC rotation, halfchange scaling 
#> Species: expanded scores based on 'otu' 
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.1.2.1.1.4 DCA
beta_dispersion(enterotype, dist = "bray", method = "DCA", group = "SeqTech",
                color_vector = c("#777711", "#117777", "#DD7788"),
                legend_title = "Sequencing tech without boxplots", lwd = 2, 
                font = 2, draw = "polygon", text = T, permanova = T, 
                y.intersp = 0.7,
                boxplot = F, where = "bottom")

#> $reduction
#> 
#> Call:
#> decorana(veg = as(otu_table(reverseASV(physeq)), "matrix")) 
#> 
#> Detrended correspondence analysis with 26 segments.
#> Rescaling of axes with 4 iterations.
#> Total inertia (scaled Chi-square): 3.8641 
#> 
#>                        DCA1   DCA2   DCA3   DCA4
#> Eigenvalues          0.5549 0.2483 0.2753 0.1670
#> Additive Eigenvalues 0.5549 0.2133 0.2613 0.1344
#> Decorana values      0.5997 0.2398 0.1972 0.1521
#> Axis lengths         3.7145 2.4516 3.2989 2.2386
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.1.2.1.1.5 PCA
beta_dispersion(enterotype, dist = "bray", method = "PCA", group = "SeqTech",
                color_vector = c("#777711", "#117777", "#DD7788"),
                legend_title = "Sequencing tech without boxplots", lwd = 2, 
                font = 2, draw = "polygon", text = T, permanova = T, 
                y.intersp = 0.7,
                boxplot = F, where = "bottom")
#> Warning in plot.window(...): "permanova" n'est pas un paramètre graphique
#> Warning in plot.window(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy, type, ...): "permanova" n'est pas un paramètre graphique
#> Warning in plot.xy(xy, type, ...): "boxplot" n'est pas un paramètre graphique
#> Warning in title(...): "permanova" n'est pas un paramètre graphique
#> Warning in title(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy.coords(x, y), type = type, ...): "permanova" n'est pas un
#> paramètre graphique
#> Warning in plot.xy(xy.coords(x, y), type = type, ...): "boxplot" n'est pas un
#> paramètre graphique

#> $reduction
#> 
#> Call: rda(X = otu, scale = T)
#> 
#>               Inertia Rank
#> Total             553     
#> Unconstrained     553  270
#> 
#> Inertia is correlations
#> 
#> Eigenvalues for unconstrained axes:
#>   PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8 
#> 332.5  14.4  13.3   8.7   7.2   5.5   4.8   4.6 
#> (Showing 8 of 270 unconstrained eigenvalues)
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.1.2.1.1.6 CA
beta_dispersion(enterotype, dist = "bray", method = "CA", group = "SeqTech",
                color_vector = c("#777711", "#117777", "#DD7788"),
                legend_title = "Sequencing tech without boxplots", lwd = 2, 
                font = 2, draw = "polygon", text = T, permanova = T, 
                y.intersp = 0.7,
                boxplot = F, where = "bottom")

#> $reduction
#> 
#> Call: cca(X = as(otu_table(reverseASV(physeq)), "matrix"))
#> 
#>               Inertia Rank
#> Total           3.864     
#> Unconstrained   3.864  263
#> 
#> Inertia is scaled Chi-square
#> 
#> Eigenvalues for unconstrained axes:
#>    CA1    CA2    CA3    CA4    CA5    CA6    CA7    CA8 
#> 0.5997 0.4023 0.2811 0.2569 0.2372 0.2301 0.1737 0.1567 
#> (Showing 8 of 263 unconstrained eigenvalues)
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>           Df SumOfSqs      R2      F Pr(>F)    
#> Model      2   30.538 0.53153 157.14  0.001 ***
#> Residual 277   26.915 0.46847                  
#> Total    279   57.453 1.00000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.1.2.1.2 Constrained models

Constrained analysis are multiple regression of the unconstrained approaches. They allow to fit the ecological data or sample data to the community data. Alternatively, you can fit other omic data to your beta diversity analysis. We have two different classes of constrained analysis :

  • rda is a regression of a PCA
  • cca is a regression of a CA
  • dbRDA is a RDA based on a dissimilarity matrix
    The function will return a plot, same as beta_dispersion, and also the result of the constrained analysis.

Vegan authors have implemented a very nice feature if you want to make your own representation instead of using mine. You can set use scores(res, tidy=T) to return a dataframe compatible with ggplot2.

3.1.2.1.2.1 CCA
res= constrained_beta_dispersion(enterotype,
                            model= "SeqTech+Gender+Nationality+Age+ClinicalStatus",
                            group="Enterotype", method = "CCA",
                            boxplot =T, 
                            text=T,
                            color_vector = tol21rainbow)
#> 
#> Some constraints or conditions were aliased because they were redundant. This
#> can happen if terms are constant or linearly dependent (collinear):
#> 'SeqTech.ThisVarHasOnly1Level', 'ClinicalStatuselderly'
#> Warning in plot.window(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy, type, ...): "boxplot" n'est pas un paramètre graphique
#> Warning in title(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy.coords(x, y), type = type, ...): "boxplot" n'est pas un
#> paramètre graphique

3.1.2.1.2.2 RDA
res= constrained_beta_dispersion(enterotype,
                            model= "SeqTech+Gender+Nationality+Age+ClinicalStatus",
                            group="Enterotype", method = "RDA",
                            boxplot =T, 
                            text=T,
                            color_vector = tol21rainbow)
#> 
#> Some constraints or conditions were aliased because they were redundant. This
#> can happen if terms are constant or linearly dependent (collinear):
#> 'SeqTech.ThisVarHasOnly1Level', 'ClinicalStatuselderly'
#> Warning in plot.window(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy, type, ...): "boxplot" n'est pas un paramètre graphique
#> Warning in title(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy.coords(x, y), type = type, ...): "boxplot" n'est pas un
#> paramètre graphique

3.1.2.1.2.3 dbRDA
res= constrained_beta_dispersion(enterotype,
                            model= "SeqTech+Gender+Nationality+Age+ClinicalStatus",
                            group="Enterotype", method = "dbRDA",
                            boxplot =T, 
                            text=T,
                            color_vector = tol21rainbow)
#> 
#> Some constraints or conditions were aliased because they were redundant. This
#> can happen if terms are constant or linearly dependent (collinear):
#> 'SeqTech.ThisVarHasOnly1Level', 'ClinicalStatuselderly'
#> Warning in plot.window(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy, type, ...): "boxplot" n'est pas un paramètre graphique
#> Warning in title(...): "boxplot" n'est pas un paramètre graphique
#> Warning in plot.xy(xy.coords(x, y), type = type, ...): "boxplot" n'est pas un
#> paramètre graphique

You can then use anova statistic test to decipher which factor or feature is significantly associated with your beta diversity

anova(res, by="terms")
#> Permutation test for capscale under reduced model
#> Terms added sequentially (first to last)
#> Permutation: free
#> Number of permutations: 999
#> 
#> Model: capscale(formula = as(otu_table(reverseASV(physeq)), "matrix") ~ SeqTech + Gender + Nationality + Age + ClinicalStatus, data = df, distance = dist, na.action = na.exclude)
#>                Df SumOfSqs      F Pr(>F)   
#> Gender          1  0.06650 0.7390  0.637   
#> Nationality     5  1.06616 2.3695  0.006 **
#> Age             1  0.43786 4.8657  0.002 **
#> ClinicalStatus  3  0.30763 1.1395  0.277   
#> Residual       26  2.33975                 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.3 Heatmap based on the phyloseq object

::: {style=“text-align: justify”} This function will perform a top taxa at a rank of your choosing and create a heatmap with annotations. For now only one annotation is supported.
The clustering is made using the hclustfunction and a Ward.D2 method. You can define the distance matrix that you want to use. It is important to note that the distance is made before any trimming of the data. This means that the distance matrix is made at the ASV/OTU level before doing the rank merging and topping, so the clustering will represent your “true” data instead of a modified dataset. In my personnal opinion, this is the only way of performing clustering or any type of generalized analysis : clustering, reducing or else on the ASV/OTU/MAG/… level and then aggregates for plotting.

If you really want to cluster on other taxonomic level you should use tax_glom before using this function.

phylo_heatmap(enterotype, top = 30, labels = "SeqTech", taxa_rank = "Genus", factor_to_plot ="Enterotype", split = 3, distance = "bray" )
#> [1] "No phylogenetic tree in this phyloseq object, bray-curtis distance selected."
#> [1] " not reversed"

3.2

3.3 Usage of various functions

The idea behind these functions is : creating a more automatic pipeline enabling filtering and subseting of the phyloseq object without having to perform a temporary phyloseq object and a temporary distance object. With these function you can directly use the “pipe” introduced by magrittr.
So you can use a subset_samples like the following and automatically plot your beta diversity without adding too much code.

enterotype %>%
  subset_samples(SeqTech=="Illumina") %>% 
  beta_dispersion(group="Enterotype", color_vector = c("#777711", "#117777", "#DD7788"), 
                  legend_title = "Enterotypes \n with bray" ,
                  lwd = 2, font = 2, draw = "lines", text = T, permanova = T, y.intersp = 0.7, 
                  where="bottomleft", cex = 3)

#> $reduction
#> Call: vegan::wcmdscale(d = d, eig = T)
#> 
#>           Inertia Rank
#> Total      2.9338     
#> Real       3.4327   41
#> Imaginary -0.4989   43
#> 
#> Results have 85 points, 41 axes
#> 
#> Eigenvalues:
#>  [1]  1.0925  0.9475  0.2273  0.1685  0.1381  0.1072  0.0971  0.0749  0.0671
#> [10]  0.0625  0.0499  0.0485  0.0429  0.0370  0.0292  0.0266  0.0244  0.0226
#> [19]  0.0208  0.0183  0.0180  0.0159  0.0123  0.0116  0.0103  0.0089  0.0078
#> [28]  0.0073  0.0068  0.0058  0.0054  0.0043  0.0039  0.0031  0.0024  0.0021
#> [37]  0.0016  0.0008  0.0007  0.0006  0.0003 -0.0005 -0.0007 -0.0010 -0.0012
#> [46] -0.0014 -0.0016 -0.0019 -0.0023 -0.0027 -0.0031 -0.0032 -0.0035 -0.0037
#> [55] -0.0038 -0.0042 -0.0045 -0.0049 -0.0053 -0.0055 -0.0061 -0.0063 -0.0067
#> [64] -0.0075 -0.0081 -0.0089 -0.0094 -0.0097 -0.0105 -0.0106 -0.0121 -0.0122
#> [73] -0.0134 -0.0152 -0.0158 -0.0165 -0.0188 -0.0205 -0.0221 -0.0260 -0.0292
#> [82] -0.0362 -0.0494 -0.0727
#> 
#> Weights: Constant
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(paste0(" as(otu_table(reverseASV(physeq)), 'matrix') ~", group)), data = as(sample_data(physeq), "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>          Df SumOfSqs     R2      F Pr(>F)    
#> Model     2   1.0638 0.3626 23.324  0.001 ***
#> Residual 82   1.8700 0.6374                  
#> Total    84   2.9337 1.0000                  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Additionally, if you want to go further you can also serialized the code with a for loop or a lapply or even a parallel approach using mclapply or doParallel.

layout(matrix(c(1,2,3),
                nrow = 1,
                ncol = 3,
                byrow = TRUE))
for(i in c("Illumina", "Sanger", "Pyro454")){
  enterotype %>%
    subset_samples(SeqTech==i & !is.na(Enterotype))%>%
    beta_diversity(dist="bray", method="PCoA", group="Enterotype", 
                   color_vector =c("#777711", "#117777", "#DD7788"), 
                   factor_to_plot = paste("Enterotypes using \n",i, "sequencing"), lwd = 2, cpoint = 2)
}

3.4 Differential abundance testing

Differential abundance testing is quite complicated because of the number of different statistical approaches existing in the literature. I personally use ALDEx2 and SIAMCAT a lot. The first because it is usually highly regarded and the second for the overall easiness of the package. For now differential_abundance implement only ALDEx2 approach.

The function will return two datasets (all taxa and one with only the significant taxa) and two plots representing the taxa significantly represented in one of the two conditions. This function only covers two by two analysis.

PROBLEM IN ALDEX2 CODE

data("GlobalPatterns")
tmp= GlobalPatterns %>%
  subset_samples(SampleType=="Feces" | SampleType=="Soil")%>%
  subset_taxa(Genus!="NA")%>%
  tax_glom("Genus")
taxa_names(tmp)= paste0(tax_table(tmp)[,'Genus'], 1:length(tax_table(tmp)[,"Genus"]))

res= tmp %>% 
  differential_abundance( group="SampleType", col1 = "brown", col2="darkgreen", plot = T)

3.4.1 Results

head(res$all_features)
#> NULL

3.4.2 Barplot results

res$barplot
#> NULL

3.4.3 Volcano plot results

res$volcano
#> NULL

TBD

3.5 Model testing

Last but not least : machine learning. Machine learning is used in general to predict outcome or predict class assignation between health and disease status. For this purpose we developed wrapper functions to screen models more easily. Most of the code is actually generated using caret [https://topepo.github.io/caret/] package.

For now the functions implement randomForest, glmnet and plsda models, other model might work but I didn’t test them yet.

3.5.1 Screen models on your raw data

This function accepts either objects or a list containing a matrix of taxa for example and a dataframe with the variables to test.

# res = screen_models(enterotype, model="glmnet", cores = 1, number = 3, repeats = 3)

3.6 Non phyloseq object oriented functions

I also created functions to make classical analysis on non phyloseq objects. A classical type of data are metabolomics or flow cytometry datasets. These datasets can easily make use of dimension reduction, analyse of variance, model testing.

3.6.1 Reduction of dimensions on non phyloseq objects

plot_reduction()

plot_constrained_reduction()

The first function is an equivalent of and the second one is an equivalent of . Make sure to use data.frame with samples as rows.

data(metabolomic)
metabolomic |> 
  plot_reduction(clinical_data = 1:4, group = "sex", method = "PCA", text = T, stat = "permanova", type = "arrows" )

#> $reduction
#> 
#> Call: rda(X = as(mat[, -clinical_data], "matrix"), scale = scale)
#> 
#>               Inertia Rank
#> Total             213     
#> Unconstrained     213   98
#> 
#> Inertia is correlations
#> 
#> Eigenvalues for unconstrained axes:
#>    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8 
#> 27.941 14.341 12.712 11.406  9.874  9.110  7.954  6.072 
#> (Showing 8 of 98 unconstrained eigenvalues)
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = as.formula(as.formula(paste0("mat[, -clinical_data] ~", group))), data = as(mat, "data.frame"), permutations = 999, method = dist, na.action = na.exclude)
#>          Df SumOfSqs      R2      F Pr(>F)  
#> Model     1   0.0858 0.02647 2.6377  0.021 *
#> Residual 97   3.1567 0.97353                
#> Total    98   3.2425 1.00000                
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

metabolomic |> 
plot_constrained_reduction(clinical_data = 1:4, model="birth_type+breastfeeding+sex", group = "birth_type")

#> $model
#> 
#> Call: cca(formula = as(mat[, -clinical_data], "matrix") ~ birth_type +
#> breastfeeding + sex, data = mat[, clinical_data], na.action = na.exclude,
#> scale = scale)
#> 
#>               Inertia Proportion Rank
#> Total         1.52003    1.00000     
#> Constrained   0.05714    0.03759    3
#> Unconstrained 1.46289    0.96241   95
#> 
#> Inertia is scaled Chi-square
#> 
#> Eigenvalues for constrained axes:
#>     CCA1     CCA2     CCA3 
#> 0.029818 0.019580 0.007746 
#> 
#> Eigenvalues for unconstrained axes:
#>     CA1     CA2     CA3     CA4     CA5     CA6     CA7     CA8 
#> 0.20095 0.18072 0.12443 0.09841 0.07894 0.06957 0.06262 0.05224 
#> (Showing 8 of 95 unconstrained eigenvalues)
#> 
#> 
#> $stat
#> Permutation test for adonis under reduced model
#> Marginal effects of terms
#> Permutation: free
#> Number of permutations: 999
#> 
#> adonis2(formula = mod, data = as(mat, "data.frame"), permutations = 999, method = dist, by = by, na.action = na.exclude)
#>               Df SumOfSqs      R2      F Pr(>F)  
#> birth_type     1   0.1916 0.01719 1.7004  0.100 .
#> breastfeeding  1   0.0449 0.00403 0.3988  0.960  
#> sex            1   0.1972 0.01770 1.7502  0.079 .
#> Residual      95  10.7031 0.96063                
#> Total         98  11.1417 1.00000                
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.6.2 Analysis of variance

Here is an important part of any analysis : finding which factors explains the most variance. The principle is that we will get the variance for each features and the variance for each group. The group variance will be summed for the given factor and then we will use 1-variance.factor/variance.total, which will give us the variance for each factors.

On top of the variance, the function will calculate p.values using kurskal tests or wilcoxon test depending of the number of factors and a FDR correction. For now the function only supports non parametric tests.

The function can take quite some time if you have a lot of samples, features and factors, you can use multi-threading here.

variance = metabolomic |> 
  dplyr::select(!child_id) |> 
  calculate_variance(clinical_data = 1:3, cores = 1)
#> [1] "birth_type"
#> [1] "breastfeeding"
#> [1] "sex"
#> p valuye now
#> Using features as id variables
#> Using features as id variables
#> Using features as id variables

The function will produce a list of data.frame

4 IgA seq analysis pipeline

You’ve performed all your sortings and sequencing, you now have three samples coming from a single individual.
We created a pipeline analysis where you will use the neg1 (9/10 of the neg fraction) and neg2 (1/10 of the neg fraction) dispersion (centered and reduced) to create a normal dispersion, using a Z approach we will have a Z score for the pos/neg1 based on the standard deviation of neg1/neg2.

Here we can see what the analysis will look like:

  • The technical dispersion is assessed using neg1/neg2for the log2 ratio and the neg1*neg2abundance for log10 (black dot)

  • The biological dispersion is assessed pos/neg1 for the log2 ratio and the pos*neg1 abundance for log10 (orange dot)

We will then take windows of X ASV (for example 20) to create n Gaussian curve and n …
The pipeline is based on three main functions that will call for other functions : seq_table, slide_z and collapse_IgAseq.

4.1 Make the seq_table list

First the seq_table function will take your phyloseq object and transform it to a list of data frames, one data frame for each samples coming from a single individual. For this function you will need to give physeq, sample_name corresponding to the name identifying the individual from which the sorted samples came, sorting_names the column where we find the samples such as : “sample1_pos”, “sample1_neg1”, “sample1_neg2”. Then the cols_to_keep that need to stay for now on “all”, it will collapse your sample_data in one column to allow you get it back later on.

The function will tell you if there is some samples are alone, if you have duplicated samples you need to sort them out or the rest of the pipeline will block. You can take a look at the architecture of the new object, you will find the ASV sequence as rownames, the taxonomy collapsed with “#” separator, the three samples having their own column and the sample_data collapse using also “#” separator. The function will only take the ASV that are present in the samples.

data("igaseq")
igaseq = transform_sample_counts(igaseq, function(x)x/sum(x))
sample_names(igaseq)= sample_data(igaseq)$sample_sort

seq.tab= seq_table(igaseq, sample_name = "sample_origin", sorting_names = "sample_sort", cols_to_keep = "all" )

DT::datatable(seq.tab$MO101, rownames = F)

The colnames will have the sample_names as given in the otu_table(), check that they correspond to your pos, neg1 and neg2.

Run the main function Now we will run the main function : slide_z. This function will take you seq_table object and run the Z function for each samples. If you are running this function for the first time use plot=T, if you already made the plots let it as FALSE it will make the loop a lot faster.

4.2 Deal with the zero values

In this approach we will use log2 ratio and log10 of abundances. As you know log2(0/x) or log2(x/0) can’t be performed, so we need a way to deal with the zeros. We came up with two approach :

  • remove all ASV with a zero value

  • replace 0 by a random number between 0 and the min value found in one of the three samples

  • replace 0 by the minimum count in each samples, this probably the worst thing to do

In sample with few ASV, i.e meconiums, I strongly recommend using the random_generation while in adults samples the no_zero approach is performing better. For more complex samples the decision is up to you, you can use the function neg_dipsersion to visualize the two outcomes.

neg_dispersion(seq.tab[[6]], positive_sorted_sample = "pos", negative_sorted_sample = "neg1", second_negative_sample = "neg2", type = "superposed")

neg_dispersion(seq.tab[[6]], positive_sorted_sample = "pos", negative_sorted_sample = "neg1", second_negative_sample = "neg2",type = "facet all three")
# run the following if you want every samples, make sure to change the number of cores
# pdf("test.pdf", width = 10, height = 7)
# mclapply(seq, neg_dispersion, mc.cores = 6, type="superposed")
# dev.off()

4.3 Run the slide_z function

The first step will be to create log2 ratios and log10 abundance (log10 abundance of pos * neg1 for example) for each ASV between the pos and the neg1 (log_2_ratio - log10_abundance) and between the neg1 and neg2 (log2_neg_ratio - log10_neg_abundance). This will be done by the function log_ratio called by the slide_z function.
The function will also create an ellipse of confidence interval of your choosing, default being confidence_interval=c(0.95,0.99, 0.999), this is done by the function ellipse_me also called by the `slide_z` function.

The output will be a list of S4 objects containing the following slots :

  • ig_seq_all : containing all the samples and all the ASV

  • ig_up : containing all the samples and all the ASV that significantly enriched in the IgA positive fraction

  • ig_down : containing all the samples and all the ASV that are significantly enriched in the IgA negative fraction

In each slot you will find the following columns :

  • taxonomy

  • sample_id

  • new : the sample_data collapsed

  • pos : positive fraction abundance

  • neg1 : negative (9/10) fraction abundance

  • neg2 : negative (1/10) fraction abundance

  • log10_abundance = log10(pos * neg1)

  • log2_ratio = log2(pos / neg1)

  • log10_neg_abundance = log10(neg1 * neg2)

  • log2_neg_ratio = log2(neg1 / neg2)

  • taxa = tax_table collapsed

  • SlideNorm = the normalized dispersion for each ASV

  • score = IgAseq score

  • ellipse_level = the level of confidence

IgA_seq=list()

system.time(for(i in names(seq.tab)){
  print(i)
  IgA_seq[[i]]= slide_z(seq.tab[[i]], positive_sorted_sample = "pos", negative_sorted_sample = "neg1", second_negative_sample = "neg2", deltaX = 30, slide_version = "slide_z_modern", alpha = 0.05, plot = F, zero_treatment = "random generation") 
})

4.4 Collapse the list of IgAseq into a single S4 object

We now need to collapse the list of IgA_seq objects into a single one, just use collapse_IgAseq and separate the columns that were pasted together.

IgA_seq = collapse_IgAseq(IgA_seq)
# DT:: datatable(IgA_seq@ig_seq_all, rownames = F)
# View(IgA_seq)


IgA_all= IgA_seq@ig_seq_all  %>% 
  separate(taxonomy, into=c("Reign","Phylum", "Order","Class","Family", "Genus","Species","ASV", "rest"), sep="#") %>% 
  separate(col = new, into=colnames(sample_data(igaseq)),sep= "#" )

IgA_all$alpha= ifelse(IgA_all$score>1.96, "positively significative", ifelse(IgA_all$score< -1.96, "negatively significative", "not significative"))

dim(IgA_all)

4.5 Plot like Gordon IgA seq

For the example we will plot the IgAseq data like Gordon’s paper. We first make a wilcoxon test to decipher which genera are significantly different from zero. Then we plot as balloon plot using ggplot.

library(rstatix)
tmp= IgA_all %>% 
  group_by(Genus, donor)%>% 
   mutate(n=n())%>%
  filter(n>5 , Genus!="NA")%>%
  wilcox_test(score~1, mu=0, alternative ="two.sided", detailed = T)

tmp2= IgA_all %>%
  group_by(Genus, donor)%>%
   mutate(n=n())%>%
  filter(n>5, Genus!="NA")%>%
  dplyr::select(Genus, score, donor)%>%
    mutate(mean_score= median(score))%>%
  left_join(tmp)

  tmp2$mean_score[tmp2$mean_score <= (-5)]= (-5)
  alpha= ifelse(tmp2$p<0.05, -log10(tmp2$p), 0.5)
  
  tmp2$donor= factor(tmp2$donor, levels = c("child_delivery","child_2_months","child_24_months", "mother_24_months"))
  
tmp2= tmp2%>%
    ungroup()%>%
     select(p, score, mean_score, Genus, donor)%>%
    mutate(alpha= ifelse(tmp2$p<0.05, -log10(tmp2$p), 0),
      alpha2= ifelse(tmp2$mean_score<0, -alpha, alpha),
      mean_score2= rescale(c(abs(mean_score)), to=c(0,5)))
   
 tmp2 %>%
   ggplot(aes(0, Genus,  fill=alpha2, size=mean_score2)) +
      geom_point(shape=21) +
       scale_fill_gradient2(low = "olivedrab4", mid = "white", high = "brown", midpoint = 0, guide=F)+
      scale_size_continuous(breaks=c(0,3,3,4,4,5,5),labels=c(-5,-4,-3,0,3,4,5),range = c(0,5))+
      guides(size = guide_legend( override.aes = list(fill =colorRampPalette(c("darkgreen", "white", "brown"))(7), 
                                                      size=c(5,4,3,1,3,4,5)), nrow=1,
                                 direction = "horizontal", 
                                 title.position = "top", 
                                 label.position = "bottom", 
                                 label.hjust = 0.5, 
                                 label.vjust = 1))+
      labs(fill="", x="Age", size="IgAseq median score")+
    facet_grid(Phylum~donor, scales="free", space = "free", labeller = labeller(donor=fac))+
         theme(axis.text.y = element_text(size=8, face="bold"),
         axis.text.x=element_blank(),
         axis.ticks.x = element_blank(),
         strip.text.y = element_blank(),
         axis.title.x = element_blank(), 
         axis.title.y = element_blank(),
         legend.position = 'bottom',
         legend.title = element_text(size=10, face="bold"), 
         legend.text = element_text(size=10, face="bold"), 
         legend.box = "vertical",
         legend.box.background =  element_rect(colour = "black"),
         strip.background = element_blank(),
         strip.text = element_text(size=12, face="bold"),
         panel.grid.major.y = element_line(linetype = 2, size=.05)
        )

4.6 Test models using the IgASeq data

5 Use multi-omics models based on mixOmics package